AITopics | optimization perspective

Collaborating Authors

optimization perspective

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

d61819e9b4a607b8448de762235148c4-Paper-Conference.pdf

Neural Information Processing SystemsJun-22-2026, 21:58:22 GMT

Leveraging the lottery ticket hypothesis, novel training GMV pipeline, activates which diverse includes sub-net mix works ed-vie within w generation, a single GNN and multi-vie through w a decomposition and learning. This approach simultaneously broadens "views" from the data, model, and optimization perspectives during training to enhance the generalization additional prediction capabilities heads of into GNNs.

artificial intelligence, information management, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.93)
Contests & Prizes (0.68)

Industry:

Education > Educational Setting (0.46)
Information Technology (0.46)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Information Management (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

TD Convergence: An Optimization Perspective

Neural Information Processing SystemsDec-26-2025, 10:09:35 GMT

We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counter example, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning.

name change, optimization perspective, td convergence, (2 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)

Add feedback

Transformers from an Optimization Perspective

Neural Information Processing SystemsDec-25-2025, 16:22:57 GMT

name change, optimization perspective, transformer, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Add feedback

Gradient Guidance for Diffusion Models: An Optimization Perspective

Neural Information Processing SystemsMay-27-2025, 11:33:08 GMT

Diffusion models have demonstrated empirical successes in various applications and can be adapted to task-specific needs via guidance. This paper studies a form of gradient guidance for adapting a pre-trained diffusion model towards optimizing user-specified objectives. We establish a mathematical framework for guided diffusion to systematically study its optimization theory and algorithmic design. Our theoretical analysis spots a strong link between guided diffusion models and optimization: gradient-guided diffusion models are essentially sampling solutions to a regularized optimization problem, where the regularization is imposed by the pre-training data. As for guidance design, directly bringing in the gradient of an external objective function as guidance would jeopardize the structure in generated samples.

diffusion model, gradient guidance, optimization perspective, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.74)

Add feedback

Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective

Liu, Junnan, Liu, Hongwei, Xiao, Linchen, Liu, Shudong, Zhang, Taolin, Ma, Zihan, Zhang, Songyang, Chen, Kai

arXiv.org Artificial IntelligenceMay-27-2025

We propose a novel framework for comprehending the reasoning capabilities of large language models (LLMs) through the perspective of meta-learning. By conceptualizing reasoning trajectories as pseudo-gradient descent updates to the LLM's parameters, we identify parallels between LLM reasoning and various meta-learning paradigms. We formalize the training process for reasoning tasks as a meta-learning setup, with each question treated as an individual task, and reasoning trajectories serving as the inner loop optimization for adapting model parameters. Once trained on a diverse set of questions, the LLM develops fundamental reasoning capabilities that can generalize to previously unseen questions. Extensive empirical evaluations substantiate the strong connection between LLM reasoning and meta-learning, exploring several issues of significant interest from a meta-learning standpoint. Our work not only enhances the understanding of LLM reasoning but also provides practical insights for improving these models through established meta-learning techniques.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.19815

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TD Convergence: An Optimization Perspective

Neural Information Processing SystemsJan-19-2025, 16:45:00 GMT

algorithm, optimization perspective, td convergence

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)

Add feedback

Transformers from an Optimization Perspective

Neural Information Processing SystemsJan-19-2025, 06:24:54 GMT

Deep learning models such as the Transformer are often constructed by heuristics and experience. To provide a complementary foundation, in this work we study the following problem: Is it possible to find an energy function underlying the Transformer model, such that descent steps along this energy correspond with the Transformer forward pass? This unfolding perspective has been frequently adopted in the past to elucidate more straightforward deep models such as MLPs and CNNs; however, it has thus far remained elusive obtaining a similar equivalence for more complex models with self-attention mechanisms like the Transformer. To this end, we first outline several major obstacles before providing companion techniques to at least partially address them, demonstrating for the first time a close association between energy function minimization and deep layers with self-attention. This interpretation contributes to our intuition and understanding of Transformers, while potentially laying the ground-work for new model designs.

optimization perspective, transformer

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.40)

Add feedback

Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective

Knauer, Ricardo, Rodner, Erik

arXiv.org Artificial IntelligenceOct-9-2023

A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective that can take an auxiliary cost to obtain features into account. Based on an extensive review of the literature, we carefully create a synthetic dataset generator for clinical prognostic model research. This allows us to systematically evaluate different heuristic and optimal cardinality- and budget-constrained feature selection procedures. The analysis shows key limitations of the methods for the low-data regime and when confronted with label noise. Our paper not only provides empirical recommendations for suitable methods and dataset designs, but also paves the way for future research in the area of meta-learning.

best subset selection, selection, subset selection, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-42608-7_10

2310.05464

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Learning Transferrable Parameters for Long-tailed Sequential User Behavior Modeling

Yin, Jianwen, Liu, Chenghao, Wang, Weiqing, Sun, Jianling, Hoi, Steven C. H.

arXiv.org Artificial IntelligenceNov-1-2020

Sequential user behavior modeling plays a crucial role in online user-oriented services, such as product purchasing, news feed consumption, and online advertising. The performance of sequential modeling heavily depends on the scale and quality of historical behaviors. However, the number of user behaviors inherently follows a long-tailed distribution, which has been seldom explored. In this work, we argue that focusing on tail users could bring more benefits and address the long tails issue by learning transferrable parameters from both optimization and feature perspectives. Specifically, we propose a gradient alignment optimizer and adopt an adversarial training scheme to facilitate knowledge transfer from the head to the tail. Such methods can also deal with the cold-start problem of new users. Moreover, it could be directly adaptive to various well-established sequential models. Extensive experiments on four real-world datasets verify the superiority of our framework compared with the state-of-the-art baselines.

artificial intelligence, machine learning, tail user, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3394486.3403078

2010.11401

Country:

Asia > Singapore (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media (0.48)
Leisure & Entertainment (0.46)
Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

In defense of weight-sharing for neural architecture search: an optimization perspective

AIHubJul-27-2020, 11:59:16 GMT

Neural architecture search (NAS) -- selecting which neural model to use for your learning problem -- is a promising but computationally expensive direction for automating and democratizing machine learning. The weight-sharing method, whose initial success at dramatically accelerating NAS surprised many in the field, has come under scrutiny due to its poor performance as a surrogate for full model-training (a miscorrelation problem known as rank disorder) and inconsistent results on recent benchmarks. In this post, we give a quick overview of weight-sharing and argue in favor of its continued use for NAS. First-generation NAS methods were astronomically expensive due to the combinatorially large search space, requiring the training of thousands of neural networks to completion. Then, in their 2018 ENAS (for Efficient NAS) paper, Pham et al. introduced the idea of weight-sharing, in which only one shared set of model parameters is trained for all architectures.

artificial intelligence, machine learning, optimization problem, (17 more...)

AIHub

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.85)
Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.60)

Add feedback